-
Notifications
You must be signed in to change notification settings - Fork 436
Prevent HTLC double-forwards, prune forwarded onions #4303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Prevent HTLC double-forwards, prune forwarded onions #4303
Conversation
|
👋 Thanks for assigning @joostjager as a reviewer! |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4303 +/- ##
==========================================
- Coverage 86.10% 86.08% -0.02%
==========================================
Files 156 156
Lines 102610 102815 +205
Branches 102610 102815 +205
==========================================
+ Hits 88354 88511 +157
- Misses 11762 11807 +45
- Partials 2494 2497 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
01aa4c0 to
4012864
Compare
4012864 to
e2ea1ed
Compare
e2ea1ed to
c40741b
Compare
|
Pushed a rebase on |
We recently added support for reconstructing ChannelManager::decode_update_add_htlcs on startup, using data present in the Channels. However, we failed to prune HTLCs from this rebuilt map if a given inbound HTLC was already forwarded to the outbound edge and in the outbound holding cell. Here we fix this bug that would have caused us to double-forward inbound HTLC forwards, which fortunately never shipped. Co-Authored-By: Claude Opus 4.5 <[email protected]>
We recently added support for reconstructing ChannelManager::decode_update_add_htlcs on startup, using data present in the Channels. However, we failed to prune HTLCs from this rebuilt map if a given HTLC was already forwarded+removed from the outbound edge and resolved in the inbound edge's holding cell. Here we fix this bug that would have caused us to double-forward inbound HTLC forwards, which fortunately was not shipped. Co-Authored-By: Claude Opus 4.5 <[email protected]>
c40741b to
3f0f811
Compare
In 0.3+, we are taking steps to remove the requirement of regularly persisting
the ChannelManager and instead rebuild the set of HTLC forwards (and the
manager generally) from Channel{Monitor} data.
We previously merged support for reconstructing the
ChannelManager::decode_update_add_htlcs map from channel data, using a new
HTLC onion field that will be present for inbound HTLCs received on 0.3+ only.
However, we now want to add support for pruning this field once it's no longer
needed so it doesn't get persisted every time the manager gets persisted. At
the same time, in a future LDK version we need to detect whether the field was
ever present to begin with to prevent upgrading with legacy HTLCs present.
We accomplish both by converting the plain update_add option that was
previously serialized to an enum that indicates whether the HTLC is from 0.2-
versus 0.3+-with-onion-pruned.
Actual pruning of the new update_add field is added in the next commit.
We store inbound committed HTLCs' onions in Channels for use in reconstructing the pending HTLC set on ChannelManager read. If an HTLC has been irrevocably forwarded to the outbound edge, we no longer need to persist the inbound edge's onion and can prune it here.
In the last commit, we added support for pruning an inbound HTLC's persisted onion once the HTLC has been irrevocably forwarded to the outbound edge. Here, we add a check on startup that those inbound HTLCs were actually handled. Specifically, we check that the inbound HTLC is either (a) currently present in the outbound edge or (b) was removed via claim. If neither of those are true, we infer that the HTLC was removed from the outbound edge via fail and fail the inbound HTLC backwards. Tests for this code are added in a follow-up PR that will be merged in 0.5. We can't test this code right now because of reconstruct_manager_from_monitors logic is needed, and whether it runs during tests is chosen randomly.
In 0.3+, we are taking steps to remove the requirement of regularly persisting
the ChannelManager and instead rebuild the set of HTLC forwards (and the
manager generally) from Channel{Monitor} data.
We previously merged support for reconstructing the
ChannelManager::decode_update_add_htlcs map from channel data, using a new
HTLC onion field that will be present for inbound HTLCs received on 0.3+ only.
The plan is that in upcoming LDK versions, the manager will reconstruct this
map and the other forward/claimable/pending HTLC maps will automatically
repopulate themselves on the next call to process_pending_htlc_forwards.
As such, once we're in a future version that reconstructs the pending HTLC set,
we can stop persisting the legacy ChannelManager maps such as forward_htlcs,
pending_intercepted_htlcs since they will never be used.
For 0.3 to be compatible with this future version, in this commit we detect
that the manager was last written on a version of LDK that doesn't persist the
legacy maps. In that case, we don't try to read the old forwards map and run
the new reconstruction logic only.
3f0f811 to
fce5071
Compare
|
Haven't reviewed yet, but the following question came up: in the previous pr and in this pr, double-forward bugs are fixed. Couldn't the fuzzer detect this? |
joostjager
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in this PR is quite complex, with multiple interacting state machines (inbound HTLC state, outbound HTLC state, holding cells, monitor updates) that need to stay consistent across restarts. The fact that multiple double-forward bugs were discovered during development suggests the state space is large enough that targeted unit tests may not provide sufficient coverage.
I keep wondering if the existing fuzzer can exercise these restart scenarios with in-flight HTLCs at various stages. The current strategy of adding a specific regression test might be too reactive?
| WithOnion { update_add_htlc: msgs::UpdateAddHTLC }, | ||
| /// This inbound HTLC is a forward that was irrevocably committed to the outbound edge, allowing | ||
| /// its onion to be pruned and no longer persisted. | ||
| Forwarded { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this commit could've been split in two. First introducing the enum with WithOnion and Legacy, and then adding Forwarded
| /// Useful for reconstructing the pending HTLC set on startup. | ||
| #[derive(Debug)] | ||
| enum InboundUpdateAdd { | ||
| #[derive(Debug, Clone)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests for this code are added in a follow-up PR that will be merged in 0.5. We
can't test this code right now because of reconstruct_manager_from_monitors
logic is needed, and whether it runs during tests is chosen randomly.
We talked about this before, but reading this again, I keep wondering if you can't add a deterministic control lever for reconstruct_manager_from_monitors so that you can already test the code without waiting for 0.5?
| // | ||
| // If 0.3 or 0.4 reads this manager version, it knows that the legacy maps were not written and | ||
| // acts accordingly. | ||
| const RECONSTRUCT_HTLCS_FROM_CHANS_VERSION: u8 = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't you want to make this version 10, to be able to insert other upgrades in between?
| // We don't want to return an HTLC as needing processing if it already has a resolution that's | ||
| // pending in the holding cell. | ||
| let htlc_resolution_in_holding_cell = |id: u64| -> bool { | ||
| self.context.holding_cell_htlc_updates.iter().any(|holding_cell_htlc| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The htlc_resolution_in_holding_cell closure iterates the holding cell for each HTLC. Is it worth putting it in a set?
| Forwarded { | ||
| /// Useful if we need to fail or claim this HTLC backwards after restart, if it's missing in the | ||
| /// outbound edge. | ||
| hop_data: HTLCPreviousHopData, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is some data in here that is redundant with InboundHTLCOutput. Not sure if we care
| mem::take(&mut already_forwarded_htlcs).drain(..) | ||
| { | ||
| if hash != payment_hash { | ||
| already_forwarded_htlcs.push((hash, hop_data, outbound_amt_msat)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a different way to do this other than drain and push back again? Maybe filter or retain or something?
| pub(super) fn prune_inbound_htlc_onion( | ||
| &mut self, htlc_id: u64, hop_data: HTLCPreviousHopData, outbound_amt_msat: u64, | ||
| ) { | ||
| for htlc in self.context.pending_inbound_htlcs.iter_mut() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log or assert if not found?
| if (decode_update_add_htlcs_legacy.is_some() || pending_intercepted_htlcs_legacy.is_some()) | ||
| && ver >= RECONSTRUCT_HTLCS_FROM_CHANS_VERSION | ||
| { | ||
| return Err(DecodeError::InvalidValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe log this specific case too?
ChannelManagerforward maps and fully reconstruct them fromChanneldata. See the follow-up tagged 0.5: (0.5) Deprecate legacy pending HTLCChannelManagerpersistence #4359Closes #4280, partially addresses #4286
Based on #4289Channels in prod based on the serialized manager version Fix double-forward, prefer legacy forward maps #4289 (comment)